Search CORE

116 research outputs found

Universality of Bayesian mixture predictors

Author: Ryabko Daniil
Publication venue
Publication date: 01/11/2016
Field of study

The problem is that of sequential probability forecasting for finite-valued time series. The data is generated by an unknown probability distribution over the space of all one-way infinite sequences. It is known that this measure belongs to a given set C, but the latter is completely arbitrary (uncountably infinite, without any structure given). The performance is measured with asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and it is attained by a convex combination of a countably many measures from the set C (a Bayesian mixture). This was previously only known for the case when the best achievable asymptotic error is 0. This also contrasts previous results that show that in the non-realizable case all Bayesian mixtures may be suboptimal, while there is a predictor that achieves the optimal performance

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

On sample complexity for computational pattern recognition

Author: Ryabko Daniil
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2005
Field of study

In statistical setting of the pattern recognition problem the number of examples required to approximate an unknown labelling function is linear in the VC dimension of the target learning class. In this work we consider the question whether such bounds exist if we restrict our attention to computable pattern recognition methods, assuming that the unknown labelling function is also computable. We find that in this case the number of examples required for a computable method to approximate the labelling function not only is not linear, but grows faster (in the VC dimension of the class) than any computable function. No time or space constraints are put on the predictors or target functions; the only resource we consider is the training examples. The task of pattern recognition is considered in conjunction with another learning problem -- data compression. An impossibility result for the task of data compression allows us to estimate the sample complexity for pattern recognition

arXiv.org e-Print Archive

Crossref

Hypotheses testing on infinite random graphs

Author: Ryabko Daniil
Publication venue
Publication date: 10/08/2017
Field of study

Drawing on some recent results that provide the formalism necessary to definite stationarity for infinite random graphs, this paper initiates the study of statistical and learning questions pertaining to these objects. Specifically, a criterion for the existence of a consistent test for complex hypotheses is presented, generalizing the corresponding results on time series. As an application, it is shown how one can test that a tree has the Markov property, or, more generally, to estimate its memory

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Characterizing predictable classes of processes

Author: Ryabko Daniil
Publication venue
Publication date: 01/01/2009
Field of study

The problem is sequence prediction in the following setting. A sequence

x_1,...,x_n,...

of discrete-valued observations is generated according to some unknown probabilistic law (measure)

\mu

. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure

\mu

belongs to an arbitrary class \C of stochastic processes. We are interested in predictors

\rho

whose conditional probabilities converge to the "true"

\mu

-conditional probabilities if any \mu\in\C is chosen to generate the data. We show that if such a predictor exists, then a predictor can also be obtained as a convex combination of a countably many elements of \C. In other words, it can be obtained as a Bayesian predictor whose prior is concentrated on a countable set. This result is established for two very different measures of performance of prediction, one of which is very strong, namely, total variation, and the other is very weak, namely, prediction in expected average Kullback-Leibler divergence

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

Independence clustering (without a matrix)

Author: Ryabko Daniil
Publication venue
Publication date: 20/03/2017
Field of study

The independence clustering problem is considered in the following formulation: given a set

S

of random variables, it is required to find the finest partitioning

\{U_1,\dots,U_k\}

S

into clusters such that the clusters

U_1,\dots,U_k

are mutually independent. Since mutual independence is the target, pairwise similarity measurements are of no use, and thus traditional clustering algorithms are inapplicable. The distribution of the random variables in

S

is, in general, unknown, but a sample is available. Thus, the problem is cast in terms of time series. Two forms of sampling are considered: i.i.d.\ and stationary time series, with the main emphasis being on the latter, more general, case. A consistent, computationally tractable algorithm for each of the settings is proposed, and a number of open directions for further research are outlined

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Clustering processes

Author: Ryabko Daniil
Publication venue
Publication date: 01/01/2010
Field of study

The problem of clustering is considered, for the case when each data point is a sample generated by a stationary ergodic process. We propose a very natural asymptotic notion of consistency, and show that simple consistent algorithms exist, under most general non-parametric assumptions. The notion of consistency is as follows: two samples should be put into the same cluster if and only if they were generated by the same distribution. With this notion of consistency, clustering generalizes such classical statistical problems as homogeneity testing and process classification. We show that, for the case of a known number of clusters, consistency can be achieved under the only assumption that the joint distribution of the data is stationary ergodic (no parametric or Markovian assumptions, no assumptions of independence, neither between nor within the samples). If the number of clusters is unknown, consistency can be achieved under appropriate assumptions on the mixing rates of the processes. (again, no parametric or independence assumptions). In both cases we give examples of simple (at most quadratic in each argument) algorithms which are consistent.Comment: in proceedings of ICML 2010. arXiv-admin note: for version 2 of this article please see: arXiv:1005.0826v

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server